Table 11.1 Molecular biology foci and important databases and software

| Molecular biology focus | Important databases/software |
| --- | --- |
| How do cells read their genome? | NCBI, EMBL, EBI, BLAST, Rfam, RNAAnalyzer, RNAfold, SMART, PDB, SCOP, CATH, ProDom, Pfam |
| How do cells control gene expression? | GEO, GENEVESTIGATOR, cBioPortal, TCGA, TESS, ALGGEN PROMO, Genomatix, MEME Suite, iRegulon, miRanda, TargetScan, STRING, KEGG, Roche Biochemical Pathways, STITCH, DrumPID |
| How cells localize, transport and secrete proteins | KEGG, PyMOL, RasMol, Ramachandran Plot, ELM Server, TMHMM |
| How cells build a solid skeleton and move actively | ExPASy, PROSITE, ProDom, PlateletWeb, MUSCLE, EMA, Metatool, YANAsquare |
| How do cells communicate? | STRING, iHOP, PRODORIC, SQUAD, Jimena, SWISS-MODEL, I-TASSER, LOMETS, QUARK, Rosetta |

on a given function or sequence is provided by domain databases; in addition to SMART
and EMBL, ProDom and Pfam are particularly important. The three-dimensional structure
of many proteins is stored in the protein structure database PDB, details of the architecture
in the structure databases SCOP and CATH.
How Do Cells Control Gene Expression?
Interestingly, at any given moment, only a fraction of the genome information is transcribed
into RNA molecules. The question is: How do I quickly find out bioinformatically which
RNA is synthesized in which cell type? The GEO (Gene Expression Omnibus) database is
well suited for this purpose: it holds detailed data from numerous gene expression
experiments for different organisms, tissues and diseases. A similar database is
GENEVESTIGATOR. The cBioPortal and The Cancer Genome Atlas (TCGA) databases
focus on cancer. Because these experiments usually measure all transcripts of a cell,
existing data can also be used to infer how one's gene of interest is regulated. For this
purpose, GEO, GENEVESTIGATOR, cBioPortal and TCGA also offer statistical analyses.
Next, there is promoter analysis software. This allows me to determine
which regulatory sequences switch a gene on and off. There are simple
programs for this, such as TESS or ALGGEN PROMO, which simply list numerous potential
binding sites for transcription factors, usually far too many. In addition,
there are better, but often commercial, programs such as Genomatix, which, among other
things, compare which of the many binding sites are conserved within a gene family and
thus presumably actually regulate transcription. Such conserved combinations are called
modules (e.g. consisting of three specific transcription factors); for example,
Liver-specific-transcription-factor-1 modules specifically drive the transcription of liver
genes. Ab initio approaches such as MEME Suite and iRegulon offer another way to find
unknown transcription factor (TF) motifs and regulatory TFs.
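The difference between simply listing binding sites and filtering for conserved ones can be illustrated with a small Python sketch. It is a toy under stated assumptions, not how TESS, ALGGEN PROMO or Genomatix work internally: the three consensus strings are simplified textbook motifs, the promoter sequences and gene names are invented, and only the forward strand is scanned with exact IUPAC matching.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Simplified consensus motifs (illustrative assumptions, not a curated matrix library).
MOTIFS = {"TATA_box": "TATAWAW", "GC_box": "GGGCGG", "CAAT_box": "CCAAT"}

# Invented promoter fragments for three hypothetical genes of one family.
PROMOTERS = {
    "geneA": "GGCCAATCTTATAAAAGGGCGGAT",
    "geneB": "ATTATATATCCGGGCGGTTACG",
    "geneC": "CCGGGCGGTTTATAAATGCA",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_sites(sequence, motifs):
    """Return {motif name: [0-based start positions]} for one sequence."""
    sequence = sequence.upper()
    hits = {}
    for name, consensus in motifs.items():
        # A lookahead is used so that overlapping matches are also reported.
        pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
        positions = [match.start() for match in pattern.finditer(sequence)]
        if positions:
            hits[name] = positions
    return hits


def conserved_motifs(promoters, motifs):
    """Return the motifs that occur at least once in every promoter."""
    shared = None
    for sequence in promoters.values():
        present = set(find_sites(sequence, motifs))
        shared = present if shared is None else shared & present
    return shared or set()


if __name__ == "__main__":
    for gene, sequence in PROMOTERS.items():
        print(gene, find_sites(sequence, MOTIFS))
    print("conserved in all promoters:", sorted(conserved_motifs(PROMOTERS, MOTIFS)))
```

Scanning a single promoter reports every match, whereas intersecting the hits across the gene family keeps only the motifs shared by all members. Real tools work with position weight matrices, both strands and statistical scoring rather than exact string matches.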
For regulation in the cell, it is also important that proteins control each other. For this,
the protein interaction database STRING (EMBL) is very good and broad (and there are